Search CORE

arXiv.org e-Print Archive

UCL Discovery

Using Neural Networks for Relation Extraction from Biomedical Literature

Author: A Koike
A Lamurias
A Lamurias
A Lamurias
A Lamurias
A Singhal
AV Aho
B Xu
CD Manning
CH Alves
D Westergaard
D Zhou
E Guresen
F Rinaldi
HC Wang
HM Müller
J Hastings
L Aroyo
M Ashburner
MY Kim
N Ma
N Peng
P Goyal
P Zweigenbaum
PN Robinson
Q Li
QL Nguyen
S HayKin
S Hochreiter
TR Gruber
W Wang
WWM Fleuren
Y Hao
Y Luo
Y Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/09/2020
Field of study

Using different sources of information to support automated extracting of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely, using neural networks algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been proved to enhance previous state-of-the-art results.Comment: Artificial Neural Networks book (Springer) - Chapter 1

Tumor taxonomy for the developmental lineage classification of neoplasms

Author: BL Humphreys
DM Baorto
E Mayr
F Marti'n-Sanchez
JJ Berman
JJ Berman
Jules J Berman
K Ahmed
LD Stein
MA Harris
MN Cantor
MY Galperin
P Zweigenbaum
PA Covitz
SH Walsh
Publication venue: BioMed Central
Publication date: 01/11/2004
Field of study

BACKGROUND: The new "Developmental lineage classification of neoplasms" was described in a prior publication. The classification is simple (the entire hierarchy is described with just 39 classifiers), comprehensive (providing a place for every tumor of man), and consistent with recent attempts to characterize tumors by cytogenetic and molecular features. A taxonomy is a list of the instances that populate a classification. The taxonomy of neoplasia attempts to list every known term for every known tumor of man. METHODS: The taxonomy provides each concept with a unique code and groups synonymous terms under the same concept. A Perl script validated successive drafts of the taxonomy ensuring that: 1) each term occurs only once in the taxonomy; 2) each term occurs in only one tumor class; 3) each concept code occurs in one and only one hierarchical position in the classification; and 4) the file containing the classification and taxonomy is a well-formed XML (eXtensible Markup Language) document. RESULTS: The taxonomy currently contains 122,632 different terms encompassing 5,376 neoplasm concepts. Each concept has, on average, 23 synonyms. The taxonomy populates "The developmental lineage classification of neoplasms," and is available as an XML file, currently 9+ Megabytes in length. A representation of the classification/taxonomy listing each term followed by its code, followed by its full ancestry, is available as a flat-file, 19+ Megabytes in length. The taxonomy is the largest nomenclature of neoplasms, with more than twice the number of neoplasm names found in other medical nomenclatures, including the 2004 version of the Unified Medical Language System, the Systematized Nomenclature of Medicine Clinical Terminology, the National Cancer Institute's Thesaurus, and the International Classification of Diseases Oncolology version. CONCLUSIONS: This manuscript describes a comprehensive taxonomy of neoplasia that collects synonymous terms under a unique code number and assigns each tumor to a single class within the tumor hierarchy. The entire classification and taxonomy are available as open access files (in XML and flat-file formats) with this article

A bioinformatics knowledge discovery in text application for grid computing

Author: A Hotho
AM Cohen
D Talia
EG Talbi
Gianfranco Tarricone
Giuseppe Mastronardi
H Shatkay
I Foster
IH Witten
M Castellano
M Castellano
Marcello Castellano
P Zweigenbaum
PC Carvalho
R Mooney
RC Bunescu
Roberto Bellotti
U Leser
UM Fayyad
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.</p

Archivio istituzionale della ricerca - Università di Bari

Is searching full text more effective than searching abstracts?

Author: A Singhal
A Trotman
B Larsen
B Rafkind
B Sigurbjörnsson
C Clarke
CW Cleverdon
CW Gay
D Demner-Fushman
D Metzler
DK Harman
G Salton
G Salton
G Salton
H Shatkay
H Yu
H Yu
H Yu
I Tbahriti
J Dean
J Kamps
Jimmy Lin
JM Ponte
JP Callan
K Seki
K Sparck Jones
L Hunter
LA Barroso
M Kaszkiel
M Krallinger
M Lalmas
M Wang
MA Hearst
MJ Schuemie
P Ogilvie
P Zweigenbaum
P Zweigenbaum
PK Shah
R Wilkinson
S Brin
S Ghemawat
S Tellex
SE Robertson
SE Robertson
T Elsayed
WJ Wilbur
WR Hersh
X Liu
Z Kou
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: <it>bm25 </it>and the ranking algorithm implemented in the open-source Lucene search engine. Results Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraphs-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles. Conclusion Users searching full text are more likely to find relevant articles than searching only abstracts. This finding affirms the value of full text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly-growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.</p

Digital Repository at the University of Maryland

Clinical narrative analytics challenges

Author: A Coden
A Rodríguez-González
A Rodríguez-González
AA Thomas
BL Humphreys
C Friedman
C Friedman
C Friedman
D Ferrucci
DA Hanauer
G Hripcsak
GK Savova
M Taboada
O Ben-Assuli
P Zweigenbaum
PM Pietrzyk
QT Zeng
R Costumero
R Costumero
R Costumero
SM Meystre
Y Ji
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Precision medicine or evidence based medicine is based on the extraction of knowledge from medical records to provide individuals with the appropriate treatment in the appropriate moment according to the patient features. Despite the efforts of using clinical narratives for clinical decision support, many challenges have to be faced still today such as multilinguarity, diversity of terms and formats in different services, acronyms, negation, to name but a few. The same problems exist when one wants to analyze narratives in literature whose analysis would provide physicians and researchers with highlights. In this talk we will analyze challenges, solutions and open problems and will analyze several frameworks and tools that are able to perform NLP over free text to extract medical entities by means of Named Entity Recognition process. We will also analyze a framework we have developed to extract and validate medical terms. In particular we present two uses cases: (i) medical entities extraction of a set of infectious diseases description texts provided by MedlinePlus and (ii) scales of stroke identification in clinical narratives written in Spanish

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Semi-automated screening of biomedical citations for systematic reviews

Author: A Aronson
A Blum
A Cohen
A Wilcox
B Settles
B Wallace
Byron C Wallace
C Blake
C Cole
C Counsell
Carla Brodley
Chih-Chung
Christopher H Schmid
CJL Chih-Wei Hsu
D Chen
DD Lewis
E Perrin
F Camous
G Druck
G Schohn
H Kilicoglu
Joseph Lau
K Brinker
KS Goh
KS Jones
L Breiman
L Hunter
M Barza
M Chung
M Yetisgen-Yildiz
N Japkowicz
P Wheeler
P Zweigenbaum
S Dasgupta
S Ertekin
S Kotsiantis
S Tong
T Joachims
T Terasawa
Thomas A Trikalinos
VN Vapnik
W Yu
Y Aphinyanaphongs
YAC Aphinyanaphongs
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Extracting causal relations on HIV drug resistance from literature

Author: A Koike
AM Cohen
Breanndán Ó Nualláin
C Giles
Charles A Boucher
D Klein
D Zhou
DR Douglas
F Horn
F Leitner
F Rinaldi
G Erkan
H Jang
H Saigo
IH Witten
J Saric
J Vercauteren
JG Liao
JH Kim
K Fundel
LJ Jensen
M Abulaish
M Huang
MY Kim
N Daraselia
O Sanchez-Graillet
P Zweigenbaum
Peter MA Sloot
Quoc-Chinh Bui
R Chowdhary
R Malik
RA Erhardt
S Ananiadou
S Katrenko
S Kim
T Lengauer
VI Torvik
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background In HIV treatment it is critical to have up-to-date resistance data of applicable drugs since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. Therefore there is an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from literature. Results In this work we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP) which produces grammatical relations and applies a set of rules to these relations. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated performance of 84% for F-score. We then combined the extracted relations using logistic regression to generate resistance values for each <drug, mutation> pair. The results of this relation combination show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in 5 hospitals from the Virolab project <url>http://www.virolab.org</url> to preselect the most relevant novel resistance data from literature and present those to virologists and medical doctors for further evaluation. Conclusions The proposed relation extraction and combination method has a good performance on extracting HIV drug resistance data. It can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other type of relations such as gene-protein, gene-disease, and disease-mutation.</p

Erasmus University Digital Repository

EUR Research Repository

DR-NTU (Digital Repository of NTU)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Enriching a biomedical event corpus with meta-knowledge annotation

Author: A de Waard
A de Waard
A Rzhetsky
AM Cohen
AS Yeh
B Medlock
F Lisacek
H Kilicoglu
H Langer
H Shatkay
J Cohen
J Ding
J Kim
John McNaught
JT Kim
K Hirohata
K Hyland
K Hyland
K Hyland
K Oda
KB Cohen
L Hoye
L McKnight
M Ashburner
M Liakata
M Light
ME Califf
O Sanchez-Graillet
P Ruch
P Thompson
P Thompson
P Zweigenbaum
P Zweigenbaum
Paul Thompson
R Bunescu
R Nawaz
Raheel Nawaz
S Ananiadou
S Soderland
S Teufel
S Teufel
Sophia Ananiadou
V Rizomilioti
V Vincze
VL Rubin
WJ Wilbur
Y Miyao
Y Mizuta
Á Sándor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Background: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event.Results: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme, as well as its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa.Conclusion: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regards to the diversity of meta-knowledge aspects annotated for each event. © 2011 Thompson et al; licensee BioMed Central Ltd

E-space: Manchester Metropolitan University's Research Repository

The University of Manchester - Institutional Repository

Mining clinical relationships from patient narratives

Author: A Rector
A Roberts
A Roberts
A Roberts
Angus Roberts
C Blaschke
C Friedman
C Giuliano
C Grover
C Nédellec
CB Ahlers
D Klein
D Lindberg
D Zelenko
Defense Advanced Research Projects Agency
G Doddington
G Zhou
H Cunningham
H Harkema
J Pustejovsky
J Thomas
K Fundel
M Goadrich
Mark Hepple
N Chinchor
N Sager
P Zweigenbaum
R Bunescu
R Gaizauskas
RC Bunescu
Robert Gaizauskas
S Katrenko
S Miller
S Pakhomov
T Rindflesch
T Wang
TC Rindflesch
U Hahn
W Chapman
Y Li
Y Lussier
Yikun Guo
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships. Results We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships. Conclusion We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types